ISIT 2015 Tutorial: Information Theory and Machine Learning
Authors
Abstract
We are in the midst of a data deluge, with an explosion in the volume and richness of data sets in fields including social networks, biology, natural language processing, and computer vision, among others. In all of these areas, machine learning has been extraordinarily successful in providing tools and practical algorithms for extracting information from massive data sets (e.g., genetics, multi-spectral imaging, Google and Facebook). Despite this tremendous practical success, relatively little attention has been paid to fundamental limits and tradeoffs, and information theory has a crucial role to play in this context. The goal of this tutorial is to demonstrate how information-theoretic techniques and concepts can be brought to bear on machine learning problems in unorthodox and fruitful ways. We discuss how any learning problem can be formalized in a Shannon-theoretic sense, albeit one that involves non-traditional notions of codewords and channels. This perspective allows information-theoretic tools (including information measures, Fano's inequality, random coding arguments, and so on) to be applied to learning problems. We illustrate this broad perspective with discussions of several learning problems, including sparse approximation, dimensionality reduction, graph recovery, clustering, and community detection. We emphasize recent results establishing the fundamental limits of graphical model learning and community detection. We also discuss the distinction between the learning-theoretic capacity when arbitrary "decoding" algorithms are allowed, and notions of computationally constrained capacity. Finally, a number of open problems and conjectures at the interface of information theory and machine learning will be discussed.
∗Program in Applied and Computational Mathematics, and Department of Electrical Engineering, Princeton University, Princeton, USA, www.princeton.edu/~eabbe
†Departments of Electrical Engineering and Computer Science, and Department of Statistics, University of California at Berkeley, Berkeley, USA, http://www.cs.berkeley.edu/~wainwrig